Self-organized Hierarchical Softmax
Authors
Abstract
We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactic and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and sentence compression tasks. We find that this approach is as fast as other efficient softmax approximations, while achieving comparable or even better performance relative to similar full softmax models.
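To make the efficiency argument concrete, the sketch below shows the general two-level (class-based) hierarchical softmax idea this line of work builds on: each word is assigned to a cluster, and p(w | h) factors into a cluster term and a within-cluster term, so a prediction touches O(C + |cluster|) outputs instead of O(V). This is a minimal sketch, not the authors' formulation: the fixed random word-to-cluster map, all sizes, and all names here are illustrative assumptions, whereas in the paper the cluster structure is itself learned ("self-organized") during training.

import numpy as np

# Illustrative two-level hierarchical softmax. The random word-to-cluster
# map and all sizes/names are assumptions for this sketch only; the paper
# learns the cluster assignment during training.
V, C, d = 10000, 100, 128                        # vocabulary, clusters, hidden size
rng = np.random.default_rng(0)
word2cluster = rng.integers(0, C, size=V)        # hypothetical fixed assignment
W_cluster = rng.standard_normal((C, d)) * 0.01   # cluster-level softmax weights
W_word = rng.standard_normal((V, d)) * 0.01      # word-level softmax weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def log_prob(h, w):
    # log p(w | h) = log p(c | h) + log p(w | c, h), with c the cluster of w
    c = word2cluster[w]
    p_cluster = softmax(W_cluster @ h)            # O(C) scores
    members = np.flatnonzero(word2cluster == c)   # words in w's cluster
    # (a real implementation would precompute the member list per cluster)
    p_within = softmax(W_word[members] @ h)       # O(|cluster|) scores
    return np.log(p_cluster[c]) + np.log(p_within[members == w][0])

h = rng.standard_normal(d)                        # a hidden state from the LM
print(log_prob(h, w=42))

With C on the order of the square root of V and roughly balanced clusters, each prediction costs O(sqrt(V)) rather than O(V), which is where the speedup over a full softmax comes from; the paper's contribution is making the cluster structure emerge during training instead of fixing it in advance.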
Similar resources
Strategies for Training Large Vocabulary Neural Language Models
Training neural network language models over large vocabularies is computationally costly compared to count-based models such as Kneser-Ney. We present a systematic comparison of neural strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise contrastive estimation and self normalization. We extend self normalization to be a proper es...
Self-organized annealing in laterally inhibited neural networks shows power law decay
In this paper we present a method which assigns to each layer of a multilayer neural network, whose network dynamics is governed by a noisy winner-take-all mechanism, an approximated temperature β. This approximated temperature is obtained by comparison of a softmax mechanism where a temperature is well defined with the noisy winner-take-all mechanism. We apply this method to a multilayer neura...
Hierarchical Memory Networks
Memory networks are neural networks with an explicit memory component that can be both read and written to by the network. The memory is often addressed in a soft way using a softmax function, making end-to-end training with backpropagation possible. However, this is not computationally scalable for applications which require the network to read from extremely large memories. On the other hand,...
Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models
Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora but they are extremely slow as they predict candidate words from a large vocabulary during training and inference. As an alternative to gradient approximation an...
Leaf-Smoothed Hierarchical Softmax for Ordinal Prediction
We propose a new approach to conditional probability estimation for ordinal labels. First, we present a specialized hierarchical softmax variant inspired by k-d trees that leverages the inherent spatial structure of (potentially-multivariate) ordinal labels. We then adapt ideas from signal processing on noisy graphs to develop a novel regularizer for such hierarchical softmax models. Both our t...
Journal: CoRR
Volume: abs/1707.08588
Pages: -
Publication date: 2017